Automated Hazard Detection in Construction Sites Using Large Language and Vision-Language Models

Sahraoui, Islem

arXiv.org Artificial Intelligence

This thesis explores a multimodal AI framework for enhancing construction safety through the combined analysis of textual and visual data. In safety-critical environments such as construction sites, accident data often exists in multiple formats, such as written reports, inspection records, and site imagery, making it challenging to synthesize hazard information using traditional approaches. To address this, the thesis proposes a multimodal AI framework that combines text and image analysis to assist in identifying safety hazards on construction sites. Two case studies were conducted to evaluate the capabilities of large language models (LLMs) and vision-language models (VLMs) for automated hazard identification. The first case study introduces a hybrid pipeline that utilizes GPT-4o and GPT-4o mini to extract structured insights from a dataset of 28,000 OSHA accident reports (2000-2025). The second case study extends this investigation using Molmo 7B and Qwen2-VL 2B, two lightweight, open-source VLMs. Using the public ConstructionSite10k dataset, the performance of the two models was evaluated on rule-level safety violation detection using natural language prompts. This experiment served as a cost-aware benchmark against proprietary models and allowed testing at scale with ground-truth labels. Despite their smaller size, Molmo 7B and Qwen2-VL 2B showed competitive performance in certain prompt configurations, reinforcing the feasibility of low-resource multimodal systems for rule-aware safety monitoring.
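The rule-level prompting setup described above can be sketched in a few lines: compose one natural-language prompt over a rule list, pair it with a site image, and map the model's per-rule verdicts back to the rules. The rule texts, prompt wording, and response format below are illustrative assumptions, not the thesis's actual templates, and the model reply is mocked rather than fetched from a real VLM.

```python
# Hedged sketch of rule-level safety-violation prompting for a VLM.
# Rules, prompt wording, and response format are invented for illustration.

SAFETY_RULES = [
    "Workers at height must wear a safety harness.",
    "Hard hats are required in all active work zones.",
    "Scaffolding must have guardrails installed.",
]

def build_prompt(rules):
    """Compose a per-rule verdict prompt to pair with a site image."""
    lines = ["For the attached construction-site image, answer each rule",
             "with 'violation' or 'compliant':"]
    lines += [f"{i + 1}. {r}" for i, r in enumerate(rules)]
    return "\n".join(lines)

def parse_response(text, rules):
    """Map one 'violation'/'compliant' verdict per line back to its rule."""
    verdicts = [ln.strip().lower() for ln in text.strip().splitlines()]
    return {rule: verdict == "violation"
            for rule, verdict in zip(rules, verdicts)}

# Example with a mocked model reply (no API call is made here).
mock_reply = "violation\ncompliant\nviolation"
flags = parse_response(mock_reply, SAFETY_RULES)
print(flags[SAFETY_RULES[0]])  # True
```

Keeping the parsing separate from the prompt keeps the same harness usable for both proprietary and open-source models, which is what makes a cost-aware benchmark of this kind practical.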


BIM-Discrepancy-Driven Active Sensing for Risk-Aware UAV-UGV Navigation

Mojtahedi, Hesam, Akhavian, Reza

arXiv.org Artificial Intelligence

This paper presents a BIM-discrepancy-driven active sensing framework for cooperative navigation between unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) in dynamic construction environments. Traditional navigation approaches rely on static Building Information Modeling (BIM) priors or limited onboard perception. In contrast, our framework continuously fuses real-time LiDAR data from aerial and ground robots with BIM priors to maintain an evolving 2D occupancy map. We quantify navigation safety through a unified corridor-risk metric integrating occupancy uncertainty, BIM-map discrepancy, and clearance. When risk exceeds safety thresholds, the UAV autonomously re-scans affected regions to reduce uncertainty and enable safe replanning. Compared to frontier-based exploration, our approach achieves similar uncertainty reduction in half the mission time. These results demonstrate that integrating BIM priors with risk-adaptive aerial sensing enables scalable, uncertainty-aware autonomy for construction robotics.

Introduction: Construction sites are among the most dynamic, unstructured, and safety-critical environments for autonomous robots. Unlike factory floors or structured indoor spaces, these environments are marked by continual change. New buildings are erected, materials are relocated, and the movement of heavy machinery and workers can be unpredictable. Such conditions make autonomous navigation particularly challenging. Construction 4.0 [1], emphasizing automation and digitalization, is moving robotics from trial phases to regular use on construction sites.
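The unified corridor-risk idea can be sketched as a scalar score over the three terms the abstract names, with a threshold that triggers a UAV re-scan. The weights, the inverse-clearance term, and the threshold value below are assumptions for illustration, not the paper's actual formulation.

```python
# Illustrative corridor-risk score: occupancy uncertainty, BIM-map
# discrepancy, and clearance combined into one number. All weights and
# the 0.5 threshold are assumed values, not taken from the paper.

def corridor_risk(uncertainty, discrepancy, clearance,
                  w_u=0.4, w_d=0.4, w_c=0.2, min_clear=0.1):
    """Higher uncertainty/discrepancy and lower clearance raise the risk."""
    clearance_term = min_clear / max(clearance, min_clear)  # in (0, 1]
    return w_u * uncertainty + w_d * discrepancy + w_c * clearance_term

def needs_rescan(risk, threshold=0.5):
    """Trigger a UAV re-scan of the region when risk exceeds the threshold."""
    return risk > threshold

# A corridor that is uncertain, disagrees with BIM, and is narrow:
r = corridor_risk(uncertainty=0.8, discrepancy=0.7, clearance=0.1)
print(needs_rescan(r))  # True
```

The useful property of a single scalar like this is that one threshold comparison decides between continuing navigation and dispatching the UAV, which keeps the re-scan policy simple to tune.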


BIM-Constrained Optimization for Accurate Localization and Deviation Correction in Construction Monitoring

Bikandi-Noya, Asier, Shaheer, Muhammad, Bavle, Hriday, Jevanesan, Jayan, Voos, Holger, Sanchez-Lopez, Jose Luis

arXiv.org Artificial Intelligence

Augmented reality (AR) applications for construction monitoring rely on real-time environmental tracking to visualize architectural elements. However, construction sites present significant challenges for traditional tracking methods due to featureless surfaces, dynamic changes, and drift accumulation, leading to misalignment between digital models and the physical world. This paper proposes a BIM-aware drift correction method to address these challenges. Instead of relying solely on SLAM-based localization, we align "as-built" detected planes from the real-world environment with "as-planned" architectural planes in BIM. Our method performs robust plane matching and computes a transformation (TF) between SLAM (S) and BIM (B) origin frames using optimization techniques, minimizing drift over time. By incorporating BIM as prior structural knowledge, we can achieve improved long-term localization and enhanced AR visualization accuracy in noisy construction environments. The method is evaluated through real-world experiments, showing significant reductions in drift-induced errors and optimized alignment consistency. On average, our system achieves a reduction of 52.24% in angular deviations and a reduction of 60.8% in the distance error of the matched walls compared to the initial manual alignment by the user.
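The rotational part of a SLAM-to-BIM transformation of this kind can be sketched with the standard Kabsch/SVD solution over matched plane normals. The paper's optimization also performs robust matching and estimates translation; this sketch assumes normal correspondences are already known and shows only the rotation fit.

```python
import numpy as np

# Hedged sketch: least-squares rotation between SLAM and BIM frames from
# matched plane normals (Kabsch method). Correspondences are assumed given;
# robust matching and translation estimation are out of scope here.

def rotation_from_normals(slam_normals, bim_normals):
    """Rotation R minimizing sum ||R @ slam_normal - bim_normal||^2."""
    S = np.asarray(slam_normals, float)   # rows are unit normals in SLAM frame
    B = np.asarray(bim_normals, float)    # matching unit normals in BIM frame
    H = S.T @ B                           # cross-covariance of the normal sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against a reflection
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

# Recover a known 90-degree rotation about z from three matched normals:
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
slam = np.eye(3)                  # three orthogonal wall normals in SLAM frame
bim = (R_true @ slam.T).T         # the same normals expressed in the BIM frame
print(np.allclose(rotation_from_normals(slam, bim), R_true))  # True
```

With the rotation fixed, the residual translation could then be estimated from the matched planes' offsets, which is the piece a full TF optimization would add.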


BIM Informed Visual SLAM for Construction Monitoring

Bikandi-Noya, Asier, Fernandez-Cortizas, Miguel, Shaheer, Muhammad, Tourani, Ali, Voos, Holger, Sanchez-Lopez, Jose Luis

arXiv.org Artificial Intelligence

Simultaneous Localization and Mapping (SLAM) is a key tool for monitoring construction sites, where aligning the evolving as-built state with the as-planned design enables early error detection and reduces costly rework. LiDAR-based SLAM achieves high geometric precision, but its sensors are typically large and power-demanding, limiting their use on portable platforms. Visual SLAM offers a practical alternative with lightweight cameras already embedded in most mobile devices. However, visually mapping construction environments remains challenging: repetitive layouts, occlusions, and incomplete or low-texture structures often cause drift in the estimated trajectory and map. To mitigate this, we propose an RGB-D SLAM system that incorporates the Building Information Model (BIM) as structural prior knowledge. Instead of relying solely on visual cues, our system continuously establishes correspondences between detected walls and their BIM counterparts, which are then introduced as constraints in the back-end optimization. The proposed method operates in real time and has been validated on real construction sites, reducing trajectory error by an average of 23.71% and map RMSE by 7.14% compared to visual SLAM baselines.
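The effect of injecting a BIM wall as an extra constraint in the back-end can be illustrated in one dimension: a drifting odometry estimate of the camera's position is fused with the position implied by a measured distance to a wall whose location is known from BIM. The weights and the closed-form weighted average below are a toy reduction of what a factor-graph back-end would do, not the paper's optimizer.

```python
# Toy 1D sketch of a BIM constraint in back-end optimization. Weights are
# assumed; a real system would solve a full factor graph over many poses.

def fuse_with_bim_constraint(x_odom, wall_x_bim, measured_dist,
                             w_odom=1.0, w_bim=4.0):
    """Weighted least-squares fusion of odometry and a BIM wall constraint.

    Minimizes w_odom*(x - x_odom)^2 + w_bim*(x - x_bim)^2 where x_bim is the
    position implied by the wall; the minimizer is the weighted average.
    """
    x_bim = wall_x_bim - measured_dist   # position implied by the BIM wall
    return (w_odom * x_odom + w_bim * x_bim) / (w_odom + w_bim)

# Odometry has drifted to 5.4 m, but the wall at x = 8.0 m is measured
# 3.0 m away, implying x = 5.0 m; the fused estimate leans toward BIM:
print(round(fuse_with_bim_constraint(5.4, 8.0, 3.0), 2))  # 5.08
```

The weight ratio plays the role of the relative confidence in BIM geometry versus visual odometry; trusting the BIM prior more pulls the estimate toward the wall-implied position and bounds drift over time.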


Automating construction safety inspections using a multi-modal vision-language RAG framework

Wang, Chenxin, Shamsabadi, Elyas Asadi, Chen, Zhaohui, Shen, Luming, Fini, Alireza Ahmadian Fard, Dias-da-Costa, Daniel

arXiv.org Artificial Intelligence

Conventional construction safety inspection methods are often inefficient as they require navigating through large volumes of information. Recent advances in large vision-language models (LVLMs) provide opportunities to automate safety inspections through enhanced visual and linguistic understanding. However, existing applications face limitations, including irrelevant or unspecific responses, restricted modal inputs, and hallucinations. Utilisation of large language models (LLMs) for this purpose is constrained by the availability of training data and frequently lacks real-time adaptability. This study introduces SiteShield, a multi-modal LVLM-based Retrieval-Augmented Generation (RAG) framework for automating construction safety inspection reports by integrating visual and audio inputs. Using real-world data, SiteShield outperformed unimodal LLMs without RAG, achieving an F1 score of 0.82, a Hamming loss of 0.04, a precision of 0.76, and a recall of 0.96. The findings indicate that SiteShield offers a novel pathway to enhance information retrieval and efficiency in generating safety reports.
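The retrieval step of a RAG pipeline like this can be sketched with a deliberately simple similarity measure: given a textual description of a site observation, fetch the best-matching safety clause by token-overlap (Jaccard) similarity. A production system such as the one described would use learned multimodal embeddings and a vector index; the clause texts here are invented examples, not real regulations.

```python
# Hedged sketch of RAG retrieval with Jaccard similarity over word tokens.
# Clauses are invented; a real system would embed images, audio, and text.

SAFETY_CLAUSES = [
    "Workers near open edges must use fall-arrest harnesses.",
    "Excavations deeper than 1.5 m require shoring or benching.",
    "Mobile cranes must keep a 3 m exclusion zone during lifts.",
]

def tokens(text):
    """Lowercased word set with basic punctuation stripped."""
    return set(text.lower().replace(".", "").replace(",", "").split())

def retrieve(query, clauses=SAFETY_CLAUSES):
    """Return the clause whose token set best overlaps the query's."""
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    q = tokens(query)
    return max(clauses, key=lambda c: jaccard(q, tokens(c)))

print(retrieve("worker at open edges without harnesses"))
```

The retrieved clause is then what gets placed in the generation prompt, which is the mechanism RAG uses to ground the report in actual regulations rather than the model's parametric memory.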


Safety Assessment of Scaffolding on Construction Site using AI

Prabhu, Sameer, Patwardhan, Amit, Karim, Ramin

arXiv.org Artificial Intelligence

In the construction industry, safety assessment is vital to ensure both the reliability of assets and the safety of workers. Scaffolding, a key structural support asset, requires regular inspection to detect and identify alterations from the design rules that may compromise its integrity and stability. At present, inspections are primarily visual and are conducted by a site manager or accredited personnel to identify deviations. However, visual inspection is time-intensive and can be susceptible to human error, which can lead to unsafe conditions. This paper explores the use of Artificial Intelligence (AI) and digitization to enhance the accuracy of scaffolding inspection and contribute to safety improvement. A cloud-based AI platform is developed to process and analyse the point cloud data of a scaffolding structure. The proposed system detects structural modifications through comparison of certified reference data with recent point cloud data. This approach may enable automated monitoring of scaffolding, reducing the time and effort required for manual inspections while enhancing safety on a construction site.
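The comparison step can be sketched as a nearest-neighbour deviation test: flag points in a fresh scaffold scan that lie farther than a tolerance from every point in the certified reference cloud. Brute-force distances are fine for a sketch; a platform like the one described would use an indexed structure at real point counts, and the 5 cm tolerance is an assumed value.

```python
import numpy as np

# Sketch of reference-vs-scan point cloud comparison. Brute-force nearest
# neighbour; the 0.05 m tolerance is an assumption for illustration.

def deviating_points(reference, scan, tol=0.05):
    """Return indices of scan points with no reference point within tol."""
    ref = np.asarray(reference, float)
    pts = np.asarray(scan, float)
    # Pairwise distances, shape (n_scan, n_ref):
    d = np.linalg.norm(pts[:, None, :] - ref[None, :, :], axis=2)
    return np.where(d.min(axis=1) > tol)[0]

reference = [[0, 0, 0], [1, 0, 0], [2, 0, 0]]
scan = [[0, 0, 0.01], [1, 0, 0], [2, 0.3, 0]]   # last point has shifted 0.3 m
print(deviating_points(reference, scan))  # [2]
```

Flagged indices would then be mapped back to scaffold components, turning a raw geometric difference into an inspection finding a site manager can act on.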


Building Information Models to Robot-Ready Site Digital Twins (BIM2RDT): An Agentic AI Safety-First Framework

Akhavian, Reza, Amani, Mani, Mootz, Johannes, Ashe, Robert, Beheshti, Behrad

arXiv.org Artificial Intelligence

The adoption of cyber-physical systems and jobsite intelligence that connects design models, real-time site sensing, and autonomous field operations can dramatically enhance digital management in the Architecture, Engineering, and Construction (AEC) industry. This paper introduces BIM2RDT (Building Information Models to Robot-Ready Site Digital Twins), an agentic artificial intelligence (AI) framework designed to transform static Building Information Modeling (BIM) into dynamic, robot-ready digital twins (DTs) that prioritize safety during construction execution. The framework bridges the gap between pre-existing BIM data and real-time site conditions by integrating three key data streams: geometric and semantic information from BIM models, real-time activity data from IoT sensor networks, and visual-spatial data collected by quadruped robots during site traversal. The methodology introduces Semantic-Gravity ICP (SG-ICP), a novel point cloud registration algorithm that leverages large language model (LLM) reasoning. This creates an intelligent feedback loop where robot-collected data updates the DT, which in turn optimizes paths for subsequent missions. The framework employs YOLOE open-vocabulary object detection and Shi-Tomasi corner detection to identify and track construction elements while using BIM geometry as robust a priori maps. Major findings from experiments demonstrate SG-ICP's superiority over standard ICP, achieving RMSE reductions of 64.3%-88.3% in alignment across varied scenarios with occluded or sparse features, ensuring physically plausible orientations. Hand-arm vibration (HAV) integration triggers real-time warnings and tasks upon exceeding exposure limits, enhancing compliance with such standards as ISO 5349-1.
PRACTICAL APPLICATIONS: Construction sites are becoming increasingly complex with the introduction of new technologies such as reality capture equipment and robots, requiring better tools to streamline adoption, avoid tool sprawl, and ensure worker safety. This research introduces a system that combines robots, smart sensors, and building information modeling (BIM) data to create a "digital twin": an up-to-date virtual copy of a construction site's geometries and safety information. The system uses quadruped robots equipped with cameras and sensors to autonomously walk through construction sites, automatically detecting and tracking objects like equipment, materials, and temporary structures. Unlike traditional approaches that start from scratch, this method leverages existing BIM data as a foundation, making the robots more accurate and efficient at understanding their surroundings. Besides geometric site updates, safety information is also presented in the updated digital twin.
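An exposure check of the kind the HAV integration could perform follows ISO 5349-1, which expresses daily vibration exposure as A(8) = sqrt(sum_i a_i^2 t_i / T0) with T0 = 8 h; the 2.5 and 5.0 m/s^2 action and limit values below are those of EU Directive 2002/44/EC. How the actual framework evaluates exposure is not detailed in the abstract, so this is a sketch of the standard's formula rather than the system's code.

```python
import math

# Daily hand-arm vibration exposure A(8) per ISO 5349-1, with the EU
# Directive 2002/44/EC action (2.5 m/s^2) and limit (5.0 m/s^2) values.

T0_HOURS = 8.0

def daily_exposure(tasks):
    """A(8) from (vibration magnitude in m/s^2, duration in hours) pairs."""
    return math.sqrt(sum(a * a * t for a, t in tasks) / T0_HOURS)

def warning_level(a8, action=2.5, limit=5.0):
    """Classify an A(8) value against the action and limit values."""
    if a8 >= limit:
        return "limit exceeded"
    if a8 >= action:
        return "action value exceeded"
    return "ok"

# Two hours on a 6 m/s^2 breaker plus one hour on a 3 m/s^2 grinder:
a8 = daily_exposure([(6.0, 2.0), (3.0, 1.0)])
print(round(a8, 2), warning_level(a8))  # 3.18 action value exceeded
```

A digital twin that tracks per-worker tool time can run this check continuously, which is what turns a passive record into the real-time warnings the framework describes.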


Safe-Construct: Redefining Construction Safety Violation Recognition as 3D Multi-View Engagement Task

Chharia, Aviral, Ren, Tianyu, Furuhata, Tomotake, Shimada, Kenji

arXiv.org Artificial Intelligence

Recognizing safety violations in construction environments is critical yet remains underexplored in computer vision. Existing models predominantly rely on 2D object detection, which fails to capture the complexities of real-world violations due to: (i) an oversimplified task formulation treating violation recognition merely as object detection, (ii) inadequate validation under realistic conditions, (iii) absence of standardized baselines, and (iv) limited scalability from the unavailability of synthetic dataset generators for diverse construction scenarios. To address these challenges, we introduce Safe-Construct, the first framework that reformulates violation recognition as a 3D multi-view engagement task, leveraging scene-level worker-object context and 3D spatial understanding. We also propose the Synthetic Indoor Construction Site Generator (SICSG) to create diverse, scalable training data, overcoming data limitations. Safe-Construct achieves a 7.6% improvement over state-of-the-art methods across four violation types. We rigorously evaluate our approach in near-realistic settings, incorporating four violations, four workers, 14 objects, and challenging conditions like occlusions (worker-object, worker-worker) and variable illumination (back-lighting, overexposure, sunlight). By integrating 3D multi-view spatial understanding and synthetic data generation, Safe-Construct sets a new benchmark for scalable and robust safety monitoring in high-risk industries. Project Website: https://Safe-Construct.github.io/Safe-Construct


Towards Edge-Based Idle State Detection in Construction Machinery Using Surveillance Cameras

Küpers, Xander, Brinke, Jeroen Klein, Bemthuis, Rob, Incel, Ozlem Durmaz

arXiv.org Artificial Intelligence

The construction industry faces significant challenges in optimizing equipment utilization, as underused machinery leads to increased operational costs and project delays. Accurate and timely monitoring of equipment activity is therefore key to identifying idle periods and improving overall efficiency. This paper presents the Edge-IMI framework for detecting idle construction machinery, specifically designed for integration with surveillance camera systems. The proposed solution consists of three components: object detection, tracking, and idle state identification, which are tailored for execution on resource-constrained, CPU-based edge computing devices. The performance of Edge-IMI is evaluated using a combined dataset derived from the ACID and MOCS benchmarks. Experimental results confirm that the object detector achieves an F1 score of 71.75%, indicating robust real-world detection capabilities. The logistic regression-based idle identification module reliably distinguishes between active and idle machinery with minimal false positives. Integrating all three modules, Edge-IMI enables efficient on-site inference, reducing reliance on high-bandwidth cloud services and costly hardware accelerators. We also evaluate the performance of object detection models on the Raspberry Pi 5 and Intel NUC as example edge computing platforms, assessing the feasibility of real-time processing and the impact of model optimization techniques.
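A logistic-regression idle decision over tracked-motion features can be sketched as below. The feature choice (mean bounding-box displacement per frame, in pixels) and the coefficients are invented for illustration; the paper's trained model and feature set are not given in the abstract.

```python
import math

# Sketch of a logistic-regression idle/active decision. The single feature
# and the weights w, b are assumptions, not the trained Edge-IMI model.

def idle_probability(mean_displacement_px, w=-1.5, b=2.0):
    """Sigmoid of a linear score; small motion implies high idle probability."""
    z = w * mean_displacement_px + b
    return 1.0 / (1.0 + math.exp(-z))

def is_idle(mean_displacement_px, threshold=0.5):
    """Binary idle decision at an assumed 0.5 probability threshold."""
    return idle_probability(mean_displacement_px) > threshold

print(is_idle(0.2))   # near-stationary excavator -> True
print(is_idle(8.0))   # machine in motion -> False
```

A model this small is exactly what makes CPU-only edge deployment viable: the expensive part of the pipeline is detection and tracking, while the idle decision itself is a handful of arithmetic operations per tracked machine.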


Enhancing Construction Site Analysis and Understanding with 3D Segmentation

Vasanthawada, Sri Ramana Saketh, Liu, Pengkun, Tang, Pingbo

arXiv.org Artificial Intelligence

Monitoring construction progress is crucial yet resource-intensive, prompting the exploration of computer-vision-based methodologies for enhanced efficiency and scalability. Traditional data acquisition methods, primarily focusing on indoor environments, falter in construction sites' complex, cluttered, and dynamically changing conditions. This paper critically evaluates the application of two advanced 3D segmentation methods, Segment Anything Model (SAM) and Mask3D, in challenging outdoor and indoor conditions. Trained initially on indoor datasets, both models' adaptability and performance are assessed in real-world construction settings, highlighting the gap in current segmentation approaches due to the absence of benchmarks for outdoor scenarios. Through a comparative analysis, this study not only showcases the relative effectiveness of SAM and Mask3D but also addresses the critical need for tailored segmentation workflows capable of extracting actionable insights from construction site data, thereby advancing the field towards more automated and precise monitoring techniques.
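Comparative evaluations of segmentation models such as SAM and Mask3D typically rest on intersection-over-union between predicted and ground-truth masks. The sketch below computes IoU on flattened 0/1 masks for brevity; the abstract does not state which metric the study uses, so this is the conventional choice rather than a confirmed detail.

```python
# Intersection-over-union of two binary masks, the standard segmentation
# quality metric, shown here on flattened 0/1 lists for brevity.

def mask_iou(pred, truth):
    """IoU of two equal-length binary masks; empty-vs-empty counts as 1."""
    inter = sum(p & t for p, t in zip(pred, truth))
    union = sum(p | t for p, t in zip(pred, truth))
    return inter / union if union else 1.0

pred  = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
print(mask_iou(pred, truth))  # 0.5
```

Averaging this score per class and per scene is what makes results on indoor-trained models comparable across the outdoor construction scenes the paper highlights as missing from current benchmarks.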